Sampled fictitious play for approximate dynamic programming

Authors

  • Marina A. Epelman
  • Archis Ghate
  • Robert L. Smith
Abstract

Sampled Fictitious Play (SFP) is a recently proposed iterative learning mechanism for computing Nash equilibria of non-cooperative games. For games of identical interests, every limit point of the sequence of mixed strategies induced by the empirical frequencies of the best-response actions that players choose in SFP is a Nash equilibrium. Because discrete optimization problems can be viewed as games of identical interests wherein Nash equilibria define a type of local optimum, SFP has recently been employed as a heuristic optimization algorithm with promising empirical performance. However, no guarantee of convergence to a globally optimal Nash equilibrium has been established for any of the problem classes considered to date. In this paper, we introduce a variant of SFP and show that it converges almost surely to optimal policies in model-free, finite-horizon stochastic dynamic programs. The key idea is to view the dynamic programming states as players, whose common interest is to maximize the total multi-period expected reward starting in a fixed initial state. We also offer empirical results suggesting that our SFP variant is effective in practice for small to moderate-sized model-free problems.

∗ Industrial and Operations Engineering, University of Michigan, Ann Arbor
† Industrial and Systems Engineering, University of Washington, Seattle
‡ Industrial and Operations Engineering, University of Michigan, Ann Arbor
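As a rough illustration of the basic idea the abstract builds on, the sketch below applies plain SFP to a discrete optimization problem viewed as a game of identical interests: each coordinate of the decision vector is a player, every player draws one past action per player from the empirical history, and each then best-responds to that sample. This is not the model-free dynamic programming variant introduced in the paper. The function name `sampled_fictitious_play`, its parameters, and the toy utility are all assumptions made for this example.

```python
import random


def sampled_fictitious_play(utility, action_sets, iterations=50, seed=0):
    """Illustrative SFP heuristic for max u(y) over a Cartesian product.

    Hypothetical helper, not the paper's algorithm: players are the
    coordinates of y, and all of them share the same payoff `utility`.
    """
    rng = random.Random(seed)
    # history[i] stores every action player i has played so far, so a
    # uniform draw from it samples the empirical frequency distribution.
    history = [[rng.choice(actions)] for actions in action_sets]
    best = tuple(h[0] for h in history)
    best_value = utility(best)

    for _ in range(iterations):
        # One sampled action per player, drawn from the empirical history.
        sample = [rng.choice(h) for h in history]

        # Each player best-responds to the sampled actions of the others.
        responses = []
        for i, actions in enumerate(action_sets):
            def value(action, i=i):
                trial = list(sample)
                trial[i] = action
                return utility(tuple(trial))
            responses.append(max(actions, key=value))

        # Record the best responses and keep the best joint action seen.
        for i, action in enumerate(responses):
            history[i].append(action)
        candidate = tuple(responses)
        candidate_value = utility(candidate)
        if candidate_value > best_value:
            best, best_value = candidate, candidate_value

    return best, best_value


# Toy separable objective (an assumption for illustration only).
targets = (2, 0, 3)
toy_utility = lambda y: -sum((a - t) ** 2 for a, t in zip(y, targets))
solution, solution_value = sampled_fictitious_play(
    toy_utility, [list(range(5))] * 3, iterations=5
)
```

On a separable objective like this toy one, each best response does not depend on the sample, so the heuristic reaches the maximizer immediately. On coupled objectives it is only a heuristic, which is the gap the abstract's convergence result addresses for the dynamic programming setting.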

Similar resources

A Fictitious Play Approach to Large-Scale Optimization

In this paper we investigate the properties of the sampled version of the fictitious play algorithm, familiar from game theory, for games with identical payoffs, and propose a heuristic based on fictitious play as a solution procedure for discrete optimization problems of the form $\max\{u(y) : y = (y_1, \ldots, y_n) \in Y_1 \times \cdots \times Y_n\}$, i.e., in which the feasible region is a Cartesian product of fi...

Sampled Fictitious Play is Hannan Consistent

Fictitious play is a simple and widely studied adaptive heuristic for playing repeated games. It is well known that fictitious play fails to be Hannan consistent. Several variants of fictitious play including regret matching, generalized regret matching and smooth fictitious play, are known to be Hannan consistent. In this note, we consider sampled fictitious play: at each round, the player sam...

Sampled Fictitious Play is Hannan Consistent (arXiv:1610.01687v1 [cs.GT], 5 Oct 2016)

Parameter-Free Sampled Fictitious Play for Solving Deterministic Dynamic Programming Problems

Sampled fictitious play for multi-action stochastic dynamic programs

We introduce a class of finite-horizon dynamic optimization problems that we call multiaction stochastic dynamic programs (DPs). Their distinguishing feature is that the decision in each state is a multi-dimensional vector. These problems can in principle be solved using Bellman’s backward recursion. However, the complexity of this procedure grows exponentially in the dimension of the decision vect...

Journal:
  • Computers & OR

Volume 38

Publication date: 2011